temperatures to evaluate predictive models

Harry J. Dowsett 1 *, Marci M. Robinson 1 , Alan M. Haywood 2 , Daniel J. Hill 2,3 , Aisling M. Dolan 2 , 
Danielle K. Stoll 1 , Wing-Le Chan 4 , Ayako Abe-Ouchi 4,5 , Mark A. Chandler 6 , Nan A. Rosenbloom 7 , 
Bette L. Otto-Bliesner 7 , Fran J. Bragg 8 , Daniel J. Lunt 8 , Kevin M. Foley 1 and Christina R. Riesselman 1 


In light of mounting empirical evidence that planetary warming is well underway, the climate research community looks to 
palaeoclimate research for a ground-truthing measure with which to test the accuracy of future climate simulations. Model 
experiments that attempt to simulate climates of the past serve to identify both similarities and differences between two 
climate states and, when compared with simulations run by other models and with geological data, to identify model-specific 
biases. Uncertainties associated with both the data and the models must be considered in such an exercise. The most recent 
period of sustained global warmth similar to what is projected for the near future occurred about 3.3-3.0 million years ago, 
during the Pliocene epoch. Here, we present Pliocene sea surface temperature data, newly characterized in terms of level 
of confidence, along with initial experimental results from four climate models. We conclude that, in terms of sea surface 
temperature, models are in good agreement with estimates of Pliocene sea surface temperature in most regions except the 
North Atlantic. Our analysis indicates that the discrepancy between the Pliocene proxy data and model simulations in the 
mid-latitudes of the North Atlantic, where models underestimate warming shown by our highest-confidence data, may provide 
a new perspective and insight into the predictive abilities of these models in simulating a past warm interval in Earth history. 
This is important because the Pliocene has a number of parallels to present predictions of late twenty-first century climate. 


Our understanding of future climate impacts, as well as our
ability to adapt to and mitigate effects, relies heavily on the 
tools with which we explore future scenarios. Numerical 
models of the climate system have evolved rapidly over the past 
several decades, in part as a response to the demand for faster 
and more confident projections of future conditions 1 . As humans 
have not yet experienced or been able to measure the magnitude 
of climate change projected for the end of this century, it is 
difficult to assess the performance of computer models. It has 
become common practice to hindcast past climate conditions and 
to verify those efforts using environmental reconstructions based 
on multiple-proxy palaeodata 2 . The confidence we place in the 
palaeoestimates is thus paramount to the understanding of model 
strengths and weaknesses. The most complete reconstruction of 
a past period of global warmth describes the mid-Piacenzian, 
an interval within the Pliocene of sustained warmth ~3.3-3.0
million years (Myr) ago, immediately before the intensification 
of large-scale Northern Hemisphere glaciation 3 . Here, we present 
an assessment of the confidence determined for each estimate 
of mean annual sea surface temperature (SST) from 95 sites 
distributed throughout the mid-Piacenzian global ocean. The 
formulation for this confidence assessment is presented in the 
Supplementary Information and all estimates and confidence levels 
are shown in Supplementary Table S1. Most of our mid-Piacenzian
SST estimates are based on quantitative analysis of planktonic 
foraminiferal faunas from Deep Sea Drilling Project (DSDP) 
and Ocean Drilling Program (ODP) cores 4-6 . Wherever possible,
additional proxies (alkenone and/or Mg:Ca palaeothermometry)
provide a more robust understanding of mixed-layer conditions 7 . 
Furthermore, regional and environmental conditions sometimes 
allow the inclusion of other biotic proxies (for example, molluscs, 
bryozoa, diatoms, dinoflagellates, radiolaria and ostracods) that 
enrich our holistic understanding of the palaeoenvironment 8-12 . 

Mid-Piacenzian SST 

Faunal estimates based on planktonic foraminifers were derived 
using either a factor analytic transfer function 13 or a revised modern 
analogue technique 14-16 with modifications to allow for extension 
to Pliocene-age assemblages. Poor carbonate preservation in many 
regions of the Pacific Ocean made siliceous microfossils (diatoms 
and radiolaria) a better choice for quantitative temperature estima- 
tion in some locations 17-20 . Diatoms were used almost exclusively 
in the Southern Ocean where they are excellent indicators of the 
position of the sea-ice margin and Antarctic polar fronts 4,9,20,21 . In 
marginal marine and shallow-water regions, analyses of mollusc 
and ostracod assemblages 22-25 provided additional geographic cov- 
erage for the dataset. 

Wherever possible, independent palaeotemperature methodolo- 
gies using numerous fossil groups were employed to confirm initial 
palaeoenvironmental estimates 7 . We included SST estimates de- 
rived from Mg:Ca ratios in shallow-dwelling planktonic foraminifer 
shells as well as the unsaturation index of alkenones (ketones 
synthesized by haptophyte algae living near the ocean surface) 
found in raw sediment, both of which have been calibrated to SST. 
At present, 20% of our estimates are based on multiple proxies,
with the addition of both Mg:Ca and alkenone palaeothermometry 
providing independent proxies for comparison at 14 sites each. In 
total, 24 of the 95 localities have non-faunal temperature proxies, 
helping to establish a more robust understanding of the palaeoen- 
vironmental setting at these sites. 

The present 95 marine locations form a global synthesis 
of an interval of warm and stable climate (relative to high- 
amplitude Pleistocene glacial-interglacial cycles) lying between the 
transition of marine isotope stages M2/M1 (3.264 Myr ago) and 
G21/G20 (3.025 Myr ago; ref. 26) in the middle part of the Gauss 
Polarity Chron. This interval ranges from C2An2r (Mammoth 
reversed polarity) to near the bottom of C2An1 (just above Kaena
reversed polarity). This ~240 kyr time slab correlates in part to
planktonic foraminiferal zones PL3 (Sphaeroidinellopsis seminulina 
highest-occurrence zone), PL4 ( Dentoglobigerina altispira highest- 
occurrence zone) and PL5 (Atlantic) ( Globorotalia miocenica 
highest-occurrence zone) or PL5 (Indo-Pacific) ( Globorotalia 
pseudomiocenica highest-occurrence zone) 27 . 

It occurs before the onset of high-amplitude oxygen-isotope 
oscillations, which represent a shift towards modern conditions 
(that is, Northern Hemisphere ice volume increased and glacial- 
interglacial variation intensified). Within the bounding positive 
δ¹⁸O excursions that mark glacial stages M2 and G20, and
excepting glacial stage KM2 ~3.1 Myr ago, benthic foraminiferal
oxygen-isotope values in this interval are equal to or isotopically
lighter than those measured today 26,28 , making this interval easily
distinguishable. Even so, a high degree of isotopic variability 
dominated by the 41 kyr period of Earth’s obliquity is evident 
within the time slab 28-30 . 

The establishment and duration of the time slab (~240 kyr) were
originally dictated by limitations in correlating spatially distant data 
sites 29 . Until a sufficient number of data locations with orbitally 
tuned chronologies can be established, a dataset of the mean warm 
phase of climate is a compromise between good spatial coverage 
and correlation potential 31 . To derive an estimate of the mean 
warm phase of climate at each site, we evaluated SST estimates 
in closely spaced time series between 3.264 and 3.025 Myr ago 
and then averaged short-term warm events (see Supplementary 
Information) using a warm-peak averaging technique 29,32,33 . The 
resulting dataset comprises confidence-assessed mean annual SST 
anomalies for verification of climate-model temperature estimates 
at individual locations (Fig. 1). 
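
The warm-peak averaging step can be illustrated with a minimal sketch. It is not the published procedure (that is described in the Supplementary Information and refs 29,32,33). It assumes, purely for illustration, that a warm peak is any sample warmer than both of its neighbours in the age-ordered series. The function name and the sample values are hypothetical.

```python
def warm_peak_average(ages_myr, sst_c, old=3.264, young=3.025):
    """Average the local SST maxima that fall inside the time slab.

    Assumption (not from the paper): a "warm peak" is any sample whose SST
    exceeds that of both neighbours in the age-sorted series.
    """
    # Keep only samples inside the slab, ordered from youngest to oldest.
    series = sorted((a, t) for a, t in zip(ages_myr, sst_c) if young <= a <= old)
    temps = [t for _, t in series]
    peaks = [
        temps[i]
        for i in range(1, len(temps) - 1)
        if temps[i] > temps[i - 1] and temps[i] > temps[i + 1]
    ]
    if not peaks:
        return None  # no interior maximum; such a site needs a closer look
    return sum(peaks) / len(peaks)


# Hypothetical time series: ages in Myr, SST in degrees C.
ages = [3.02, 3.05, 3.08, 3.11, 3.14, 3.17, 3.20, 3.23, 3.26, 3.30]
ssts = [14.0, 15.0, 17.5, 16.0, 15.5, 18.5, 16.5, 17.0, 15.0, 13.0]
print(warm_peak_average(ages, ssts))
```

Subtracting the modern mean annual SST at the same location from this warm-phase mean would then give an anomaly of the kind plotted in Fig. 1b.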

Assessing confidence 

Measures of uncertainty or confidence have become an important 
focus of Intergovernmental Panel on Climate Change (IPCC)
reports. In assessing confidence in the SST-verification data 
set, we follow guidance developed by the IPCC to define 
a new confidence metric for evaluating and communicating 
uncertainty that is applicable across a variety of proxies and 
proxy methodologies. As is always the case in palaeoclimate 
studies, some elements of our confidence assessment (for example, 
stationarity of environmental tolerances) are inherently non- 
quantifiable 15,33 and therefore cannot support a meaningful 
calculation of quantitative error. Instead, based on the principles 
outlined by the IPCC 34 , we have developed a semiquantitative 
measure of confidence accounting for quality of the age control 
of the samples at each site, number of samples at each site, 
fossil preservation and abundance, proxy method or technique 
used (quantitative or semiquantitative) and performance of the 
proxy method or technique used (see Supplementary Information). 
As set out by the IPCC, we must invoke expert subjectivity to 
arrive at a level of confidence, relying on our understanding 
of the specifics of individual sites and methods as well as the 
complexity of the dataset as a whole, while providing a traceable
account of the steps leading to the estimated confidence level.
This type of confidence assessment can best be made with an 
understanding of both modern and palaeoceanographic settings 
and with an in-depth knowledge of the fossil material and 
methodologies used for analysis. 
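
To make the idea of a traceable, semiquantitative assessment concrete, the sketch below combines per-factor scores into a single level. The actual formulation is given in the Supplementary Information and relies on expert judgement; the scoring scale, the "weakest factor" rule and the cap for semiquantitative methods used here are all hypothetical choices made only for illustration.

```python
LEVELS = ["low", "medium", "high", "very high"]


def confidence_level(age_control, sampling, preservation, performance, quantitative):
    """Combine per-factor scores (0 = poor .. 3 = excellent) into one level.

    Hypothetical rule: the weakest factor sets the overall level, and a
    semiquantitative method cannot rate above "medium".
    """
    scores = [age_control, sampling, preservation, performance]
    if not quantitative:
        scores.append(1)  # index of "medium" in LEVELS
    return LEVELS[min(scores)]


# Scores are illustrative guesses patterned on the four example sites
# discussed later in this section; they are not values from the paper.
examples = {
    "DSDP 552": confidence_level(3, 3, 3, 3, quantitative=True),
    "ODP 677": confidence_level(2, 3, 3, 3, quantitative=True),
    "E36-33": confidence_level(3, 3, 3, 3, quantitative=False),
    "ODP 754": confidence_level(1, 3, 3, 0, quantitative=True),
}
print(examples)
```

A rule of this kind keeps the account traceable because the factor that limits each site's rating can be read directly from its scores.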

The chronological confidence is both a measure of quality of 
age control and how well the Pliocene Research Interpretation 
and Synoptic Mapping (PRISM) interval can be identified in a 
specific sedimentary sequence. An orbitally tuned isotopic record 
or a complete record of palaeomagnetic reversals through the 
Pliocene earns a higher confidence. Conversely, a small number 
of biochronologic events, fossil first or last appearances calibrated 
to the palaeomagnetic timescale in another region, earns lower 
confidence in the absence of local palaeomagnetic stratigraphy. 

We also weight the assessment based on the number of samples 
analysed within the time slab, fossil preservation and abundance. 
In short, more samples equate to a better chance of characterizing 
temperature variability across the time slab. Similarly, more 
specimens equate to a higher probability that rare elements of 
the fauna or flora are represented. At least 300 specimens per 
sample are required to attain statistical confidence and avoid 
potential bias in the assemblage 35 . Even in samples with high 
abundances, preservation is important because dissolution can 
selectively remove less-robust species, altering the assemblage and 
thereby producing biased temperature results. In general, warm- 
water species of foraminifera are most fragile and their preferential 
dissolution drives temperature estimates cooler. 
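
The effect of specimen counts on the representation of rare taxa can be illustrated with a simple binomial argument. This is an assumption introduced here (specimens picked independently and at random), not the derivation given in ref. 35; the function name and the 1% abundance are hypothetical.

```python
def detection_probability(relative_abundance, n_specimens):
    """Chance of picking at least one specimen of a taxon in a random count."""
    return 1.0 - (1.0 - relative_abundance) ** n_specimens


# A taxon making up 1% of the assemblage, counted at three sample sizes.
for n in (50, 100, 300):
    print(n, round(detection_probability(0.01, n), 3))
```

Under this assumption a count of 300 gives roughly a 95% chance of encountering a taxon that makes up 1% of the assemblage, whereas a count of 100 gives less than two chances in three.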

Methods of determining SST vary with fossil group and region 
and therefore are hard to compare. A quantitative estimate is 
considered better than a qualitative estimate simply because it is 
reproducible. The performance of the method is considered as part 
of the confidence assessment. For example, the performance of 
transfer functions is based on communality, the measure of how 
well the fossil assemblage can be described by the modern factors 
(assemblages). Performance of the modern analogue technique is 
based on the multivariate distance between the fossil assemblage 
and its closest modern analogues, and the agreement in temperature 
among those closest analogues 36 . Temperature estimates from 
geochemical methodologies (that is, Mg:Ca and alkenone) are
included or excluded based on sample characteristics that may 
limit analysis. For example, an alkenone-based SST estimate is not 
included if the total alkenone concentration of that sample falls 
below a threshold of reproducibility. 
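
The two performance measures named for the modern analogue technique (distance to the closest analogues and temperature agreement among them) can be sketched as follows. The squared chord distance, the number of analogues k and the unweighted mean are assumptions chosen for illustration; the revised technique actually used is described in refs 14-16. All data values are hypothetical.

```python
import math


def squared_chord_distance(a, b):
    """Dissimilarity between two assemblages given as relative abundances."""
    return sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(a, b))


def modern_analogue_sst(fossil, modern_assemblages, modern_sst, k=3):
    """Return (mean SST, SST spread, largest distance) over the k closest analogues."""
    ranked = sorted(
        (squared_chord_distance(fossil, m), t)
        for m, t in zip(modern_assemblages, modern_sst)
    )[:k]
    temps = [t for _, t in ranked]
    mean_sst = sum(temps) / len(temps)
    spread = max(temps) - min(temps)  # agreement among the analogues
    return mean_sst, spread, ranked[-1][0]


# Hypothetical three-taxon assemblages (relative abundances) and their SSTs.
modern = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.3, 0.6], [0.5, 0.4, 0.1]]
sst = [24.0, 22.0, 8.0, 20.0]
estimate, spread, worst = modern_analogue_sst([0.6, 0.3, 0.1], modern, sst, k=2)
print(estimate, spread, worst)
```

A small largest distance and a small spread together indicate that the fossil assemblage has close modern counterparts that agree on temperature, which is the situation that earns higher confidence.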

Different palaeotemperature proxies measure different aspects 
of temperature by sampling the marine environment at various 
spatial and temporal resolutions, further complicated by effects 
unique to each signal carrier and method. Therefore, our multiple- 
proxy analysis is done on a site-by-site basis, taking into account 
the full range of palaeoenvironmental information derived from a 
complete assessment of a fossil assemblage and allied geochemical 
proxies, to determine the overall quality of the temperature 
estimate 37,38 . Slight differences between multiple-proxy estimates 
from a single site strengthen the confidence of the overall site 
estimate, compared with an estimate from a single proxy. 

Very-high-confidence (VHC) example. DSDP site 552 in the 
North Atlantic is among the group of sites attaining the highest 
confidence level because every piece of information, proxy 
and method is available in excellent condition and in good 
agreement. Chronology at this site is based on palaeomagnetics 39 , 
with all magnetochrons recorded in the sediment core, key 
biostratigraphic events and a high-resolution oxygen-isotope 
record clearly indicating the mid-Piacenzian interval 40 . Planktonic 
foraminifera in 22 samples are abundant and well preserved, 
and transfer-function communalities are very high (0.894-0.991; 
ref. 41). SST estimates are provided by multiple proxies with
Mg:Ca and alkenone-derived estimates, indicating mean annual
conditions, bracketed by faunal cold- and warm-season estimates 7 .

Figure 1 | Confidence estimates and temperature anomalies from PRISM dataset. a, Distribution of verification dataset sites; symbols indicate region
(Atlantic, bullets; Pacific, squares; Indian, triangles; Arctic, pentagons) and colours indicate level of confidence. b, Mean annual SST anomalies at 95
confidence-assessed Pliocene localities; see Supplementary Table S1.

High-confidence example. At ODP site 677 in the eastern 
equatorial Pacific, alkenone-derived SST estimates from 28 samples 
fall between faunal-based cold- and warm-season estimates from
16 samples. The modern analogue technique performs well on the 
foraminifer assemblages, returning multivariate distances between 
fossil and modern assemblages of less than 0.28. However, 
chronology at this site is based solely on integrated biochronology 42 , 
and is therefore less confidently resolved than at site 552. 

Medium-confidence example. SST estimates from site E36-33 in
the Southern Ocean are based on diatom assemblages and their
relationship to modern assemblages and summer temperatures
near the Antarctic polar front. Because this palaeothermometry
method is semiquantitative, this site earns a
medium, rather than high, confidence rating. The diatoms
from this site are abundant with good to excellent preserva- 
tion and the chronology is based on a fully resolved record 
of palaeomagnetic reversals augmented by five key biostrati- 
graphic events 43 . 

Low-confidence example. A single site in the SST-verification data 
set (ODP site 754 in the Indian Ocean) is categorized as low 
confidence and is included in the verification dataset solely for 
the purposes of comparison with higher levels of confidence. At 
site 754, chronology is based on only two biostratigraphic events 44 . 
The planktonic foraminifera from 30 samples are abundant
and well preserved. When assemblage data were applied to a
newly developed Indian Ocean transfer function, however, the
resulting communalities ranged from 0.03 to 0.20. This very poor
performance by the transfer function indicates a situation for which
there is no analogue and the SST estimates provided by the transfer
function command little confidence.

Figure 2 | Data and model mean annual temperature profiles. a, Verification dataset SST estimates superimposed on zonally averaged multimodel mean
annual SST from four climate-model simulations shown as grey band with width equal to ±2σ. Symbols indicate ocean basin; colours indicate level of
confidence. Zonally averaged modern mean annual SST (ref. 46) shown as dashed line, PRISM3 reconstruction 3,6 shown as solid line. b, Same as in a
except North Atlantic region only, with ±2σ SST variability bars on deep-sea sites with better than 22 kyr sample resolution and multiple warm peaks in
the SST-estimate time series.

Results 

Twenty-seven of the 95 localities were ranked with VHC and 32 with 
high confidence. All ocean basins contain sites with confidence levels
ranging from VHC to medium confidence. The regional distribution
of confidence in Pliocene SST estimates is shown in Fig. 1a.

The highest percentage of VHC sites is found in the North 
Atlantic Ocean where they are confined to latitudes below 60° N. 
VHC sites form a transect from the Caribbean Sea (DSDP site 502) 
to the northeast Atlantic (DSDP sites 548, 552 and ODP site 610). 
These VHC sites illustrate an ever-increasing temperature anomaly 
with increasing latitude (Fig. 1b). This warming trend is extended
further north to the Arctic Ocean by including high-confidence site 
907 and medium-confidence sites 909 and 911. 

In the North Pacific Ocean, a combination of VHC and high- 
confidence sites documents both Pliocene warmth and the path 
of the Kuroshio current (Fig. 1). In the Southern Hemisphere, 
although no VHC sites are found south of ~45° S, a high-confidence 
site is found as far as 77° S in the Ross Sea. The circumpolar 
distribution of medium-confidence sites in the Southern Ocean 
(medium confidence because the reconstruction method is not 
quantitative) lends robust support to the poleward displacement 
of the Antarctic polar front zone and decreased sea-ice distribution 
during the mid-Piacenzian relative to present-day conditions 19 . 

As our highest-confidence sites are not geographically restricted, 
they can serve as a starting point for formulating global hypotheses. 
For example, upwelling zones in the North, equatorial and South 
Pacific, and in the North Atlantic off North Africa, are represented 
by VHC sites showing warmer-than-modern SST. This indicates a 
commonality among mid-Piacenzian upwelling regions and argues 
for a system-wide phenomenon of warmer nutrient-rich upwelling 
waters across the globe. 


Comparison of the multimodel mean to Pliocene SST 

To assess the performance of Pliocene climate-model simulations, 
we compare zonally averaged multimodel mean annual surface 
temperature results from four leading climate models (the National 
Center for Atmospheric Research CCSM4; the Hadley Centre 
HadCM3; the University of Tokyo/Japan Agency for Marine-Earth 
Science and Technology MIROC4m and NASA Goddard Institute 
for Space Studies GISS-ER) all initialized and run using an identical 
experimental design and protocol 45 (see Supplementary Table 
S2) to the confidence-assessed SST-verification data (Fig. 2). The 
comparison also includes zonally averaged modern SST (ref. 46) 
and the globally interpolated PRISM3 SST reconstruction 3 . 
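
The comparison described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: all models share one latitude-longitude grid, land cells are marked None, every latitude row contains ocean in every model, and the ±2σ band is the sample standard deviation across models. The function names and all values are hypothetical and are not model output.

```python
from statistics import mean, stdev


def zonal_mean(field):
    """Average each latitude row over longitude, skipping land cells (None)."""
    return [mean(v for v in row if v is not None) for row in field]


def multimodel_band(models):
    """Per-latitude (low, mean, high) with low/high at -/+ 2 sigma across models."""
    profiles = [zonal_mean(m) for m in models]
    band = []
    for values in zip(*profiles):
        mu = mean(values)
        sigma = stdev(values)  # spread between models at this latitude
        band.append((mu - 2 * sigma, mu, mu + 2 * sigma))
    return band


def brackets(band_row, site_sst):
    """True if a verification-site SST falls inside the multimodel band."""
    low, _, high = band_row
    return low <= site_sst <= high


# Four hypothetical models, two latitude rows, three longitudes each.
models = [
    [[27.0, 28.0, None], [10.0, 11.0, 12.0]],
    [[28.0, 29.0, None], [11.0, 12.0, 13.0]],
    [[29.0, 30.0, None], [12.0, 13.0, 14.0]],
    [[30.0, 31.0, None], [13.0, 14.0, 15.0]],
]
band = multimodel_band(models)
print(brackets(band[0], 30.0), brackets(band[1], 18.0))
```

A site whose SST estimate falls outside the band at its latitude flags either a local effect such as upwelling or a gradient that the models do not reproduce.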

The overall shape of the pole-to-pole temperature profile 
is nearly identical across modern, PRISM3 reconstruction and 
the multimodel mean representations of the surface ocean. The 
similarity of the zonally averaged PRISM3 reconstruction to the 
others is instructive as it emphasizes the commonalities between 
Pliocene, modern and near-future climates, relative to warm 
periods of the more distant past. The models are qualitatively in 
close agreement with each other in the Southern Hemisphere, but 
less so in the Northern Hemisphere. 

A PRISM3 gridded Pliocene-SST reconstruction was created 
by extrapolating from and interpolating between a subset of the 
verification data (86 of the 95 sites) 3 ; hence, individual site data 
representing local conditions should not be expected to fall squarely 
on the zonally averaged PRISM3 gradient. For example, site 532 
registers significantly cooler mean annual SST than the overall 
reconstruction or the models because this site monitors upwelling 
in the Benguela current and is therefore cooler than other estimates 
at the same latitude. Most upwelling sites show similar cool offsets. 
The verification data points, particularly the VHC sites, should 
instead be used to identify gradients in the data not mirrored 
in the zonal gradients. 

The models show good agreement with the data in the Southern 
Hemisphere and in the Southern Ocean. Likewise, the 2σ variability
about the multimodel mean brackets the verification data points 
in the North Pacific (Fig. 2a). Low-latitude warming away from 
upwelling regions has long been a feature of simulated Pliocene 
climate primarily owing to increased CO₂ used in these modelling
experiments. The VHC verification sites, although showing a range
of low-latitude temperature, clearly support warmer tropics.

Figure 3 | Data-model comparison. a, Multimodel mean (MMM) annual SST anomaly (°C), Pliocene minus pre-industrial. b, Standard deviation of
modelled SSTs. c, Anomaly as a minus PRISM3 SST anomaly. d, Difference between the multimodel mean pre-industrial SST prediction and HadISST
dataset (averaged between 1870 and 1900) showing biases for the pre-industrial era. e, Same as c except removing pre-industrial biases shown in d.
f, Scatter plot showing no correlation between model differences to HadISST and model differences to PRISM3 SSTs. Red crosses highlight North Atlantic
data sites south of Iceland.

One clear difference between the model simulations and the 
verification data is immediately apparent. Neither the multi- 
model mean nor outputs from any of the individual models 
(Supplementary Figs S2 and S3) capture the amount of warming 
shown by the VHC verification data in the North Atlantic (Figs 1 
and 2b). These sites clearly show increased anomalies with increas- 
ing latitude, probably amplified by reduced sea ice in the Northern 
Hemisphere and associated positive feedbacks, but the models do 
not capture the decreasing thermal gradient documented by multi- 
ple proxies even when variability in the Pliocene estimates is taken 
into account (Fig. 2b). This may be linked to differences in the po- 
sition of the Gulf Stream-North Atlantic Drift in the models com- 
pared with the geological reconstruction. The models show a high 
degree of variation in the North Atlantic south of Iceland (~10 °C
in the mean annual SST at 2σ; see also Supplementary Fig. S3).

Previous analyses of model performance have identified limitations
in the ability to reproduce the pre-industrial-era SST 1 . In
this Pliocene comparison, however, the observed discrepancy in
the North Atlantic is larger and of a different nature than the
error observed in pre-industrial simulations carried out by the 
same models. An analysis of model performance in simulating 
reconstructed Pliocene versus observed pre-industrial North At- 
lantic SSTs indicates that the discrepancy noted in our Pliocene 
analysis is probably not solely associated with errors in the simulated 
pre-industrial state, but instead may provide new insights into 
the predictive abilities of these models, above and beyond what is 
already known (Fig. 3). 
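
The logic behind this test can be sketched per site as follows. The quantities follow the panel descriptions in the Fig. 3 caption, but treating the bias removal as a simple subtraction (e = c - d) is an assumption made here, as are the function names and every number in the example.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def data_model_differences(mmm_plio, mmm_pi, prism_anom, observed_pi):
    """Per-site tuples (c, d, e) in the sense of Fig. 3, panels c to e."""
    rows = []
    for plio, pi, prism, obs in zip(mmm_plio, mmm_pi, prism_anom, observed_pi):
        model_anom = plio - pi   # panel a: Pliocene minus pre-industrial
        c = model_anom - prism   # panel c: model anomaly minus proxy anomaly
        d = pi - obs             # panel d: pre-industrial model bias
        e = c - d                # panel e: assumed form of bias removal
        rows.append((c, d, e))
    return rows


# Hypothetical SSTs (degrees C) at four sites.
rows = data_model_differences(
    mmm_plio=[20.0, 15.0, 12.0, 9.0],
    mmm_pi=[18.0, 13.0, 10.0, 7.0],
    prism_anom=[2.5, 4.0, 6.0, 7.0],
    observed_pi=[18.5, 12.0, 10.5, 6.0],
)
r = pearson_r([d for _, d, _ in rows], [c for c, _, _ in rows])
print(rows, round(r, 3))
```

If the Pliocene mismatch were simply inherited from pre-industrial biases, d and c would be strongly correlated across sites; a weak correlation points instead to a discrepancy specific to the Pliocene simulations.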

Implications and outlook 

The IPCC has traditionally focused on multimodel mean climate 
scenarios to depict future climate conditions and has developed 
protocols to define confidence in those projections 1 . We have taken 
the same concepts and adapted them to a comprehensive set of 
SST estimates based on multiple proxies from the last sustained 
period of global warmth similar to that projected for the end of 
the present century. The resulting confidence-assessed verification 
SST dataset represents the highest concentration of data focused 
on a period significantly warmer than the pre-industrial period 


NATURE CLIMATE CHANGE | VOL 2 | MAY 2012 | www.nature.com/natureclimatechange 


369 






ARTICLES 


NATURE CLIMATE CHANGE DOI: 10.1038/NCLIMATE1455 


and offers a test to Pliocene climate simulations. More notably, 
the verification dataset can be used to assess model behaviour 
regionally and at specific locations. Regional or site-specific levels 
of confidence provided by the verification dataset can be used as 
guidance for climate-simulation improvement. 

The verification dataset is in many ways similar to multimodel 
mean climate scenarios of the same time period produced by four 
leading climate models. However, the confidence-assessed verifica- 
tion data show warming in the North Atlantic and Arctic oceans 
beyond what is simulated by the climate models used, implying 
more warmth from ocean heat transport than the models repro- 
duce. Furthermore, the verification dataset (and faunal assemblage 
and geochemical data it is derived from) recognizes a system-wide 
phenomenon of warmer nutrient-rich upwelling regions during the 
mid-Piacenzian that is not present in the multimodel mean. 

Although this analysis has identified data/model discord in the 
North Atlantic, we recognize inherent uncertainties in modelling 
Pliocene SSTs. For example, at present we are unable to constrain 
with certainty a number of critical forcing mechanisms and bound- 
ary conditions that climate models require to simulate Pliocene 
SSTs with the same degree of skill that they are able to simulate modern
SSTs (for example, atmospheric CO₂ concentrations as well as
other trace gases such as CH₄, aerosol and dust loading, and orbital
forcing). Reducing uncertainties wherever possible and thereby im- 
proving our skill in simulating regional characteristics of Pliocene 
climate is of prime importance to our understanding of this poten- 
tial window into late twenty-first century climate. We consider our 
characterization of Pliocene SST data in terms of confidence levels a 
critical step in reducing uncertainties in the ground-truth data. 

Received 29 November 2011; accepted 15 February 2012; 